-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Router observability (Current QPS, router-side queueing delay, etc) Part 1 #119
Feat: Router observability (Current QPS, router-side queueing delay, etc) Part 1 #119
Conversation
29260ac
to
2ba572e
Compare
@ApostaC @gaocegege @YuhanLiu11 please help me review the first part of the PR. I split into 2 small PRs to make it easier to manage because this feature will require quite a amount of files to be changes if do all-in-one. This would also save time to check the work as well 😃 |
@sitloboi2012 Thank Really appreciate your contribution! @YuhanLiu11 @Shaoting-Feng PTAL 🙏 |
@sitloboi2012 Thanks for your contribution! Can you fix the pre-commit errors? |
69e88fb
to
1261ee4
Compare
hi @YuhanLiu11 @Shaoting-Feng, I ran pre-commit again, please help me check this PR |
The PR seems to contain a lot of unrelated stuff (the changes in previous PRs). Is it caused by the rebase? @sitloboi2012 Can you try fixing this? Maybe cherry-pick your modifications? Or create a new PR based on the diff with the latest main? |
Btw, this is really great work and we appreciate your contribution! Hopefully the PR problem can be fixed easily. |
yep should I will cherry picking the stuff out to clean the PR |
1261ee4
to
209dac7
Compare
@ApostaC should be good now, no more redundant file changes 😃 |
Great! Will take a look soon! Thanks! |
c355195
to
c2033eb
Compare
src/vllm_router/router.py
Outdated
stats = GetRequestStatsMonitor().get_request_stats(time.time()) | ||
for server, stat in stats.items(): | ||
current_qps.labels(server=server).set(stat.qps) | ||
avg_decoding_length.labels(server=server).set(stat.ttft) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using TTFT as decoding length here?
src/vllm_router/router.py
Outdated
es = engine_stats[url] | ||
logstr += ( | ||
f" Engine Stats (Dashboard): Running Requests: {es.num_running_requests}, " | ||
f"Queueing Delay (requests): {es.num_queuing_requests}, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this name to "Queueing Delay"?
hi @ApostaC @YuhanLiu11 , I updated the PR with some update related to the metrics and other minor fixes to align with the latest PR that we currently have on |
Thanks for your great contribution! We will take a look soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contirbution.
The Prometheus Metrics endpoint in the router looks good to me. However, could you please leave the other parts of the router unchanged? I've noticed many changes, such as comment modifications and variable redefinitions, that seem unrelated to this PR.
""" | ||
Route the incoming request to the backend server and stream the response | ||
back to the client. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please keep these comments?
in_router_time = time.time() | ||
request_id = str(uuid.uuid4()) | ||
|
||
# TODO (ApostaC): merge two awaits into one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please keep these comments?
src/vllm_router/router.py
Outdated
server_url = GetRoutingLogic().route_request( | ||
endpoints, engine_stats, request_stats, request | ||
) | ||
|
||
logger.info(f"Request {request_id} routed to {server_url}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information has already been logged in the two lines below.
purpose = "unknown" | ||
else: | ||
purpose = form["purpose"] | ||
purpose = form.get("purpose", "unknown") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to delete the conditional statement ?
src/vllm_router/router.py
Outdated
@@ -181,8 +194,7 @@ async def route_get_file(file_id: str): | |||
@app.get("/v1/files/{file_id}/content") | |||
async def route_get_file_content(file_id: str): | |||
try: | |||
# TODO(gaocegege): Stream the file content with chunks to support | |||
# openai uploads interface. | |||
# TODO: Stream file content in chunks to support large files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please keep these comments?
raise ValueError( | ||
"Static backends must be provided when using static service discovery." | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to delete this conditional statement?
ea69391
to
4cb025f
Compare
hi @ApostaC @YuhanLiu11 @Shaoting-Feng , I updated the code again based on your comment, please help me review it again please, cheers team 😄 |
4cb025f
to
77e09e3
Compare
hey @sitloboi2012 , thanks again for your great contribution! Can you also provide some examples showing the code can run and produce expected output? |
yep sure, let me update the code to include your comment, test case and instruction on how to reproduce the work |
Please resolve the conflicts. |
77e09e3
to
feec809
Compare
hi @YuhanLiu11, to test this code, you can use the following file that has been available. Currently the work that connecting between router + prometheus + grafana is WIP and this PR only focusing on setting the stuff that will be use in the next step as I mentioned in the description of the PR.
If run successfully, you should have some output like this in your terminal:
|
Thanks! Will take a look soon. |
hey @sitloboi2012 I'm able to follow the instructions you gave and get the the expected output. Can you fix the pre-commit issues? (sorry for going back-and-forth for the tests..) and then I can merge your PR. Thanks again! |
…ree vLLM metrics, operational metrics, router observe metrics, update requirement.txt for router and tests, update install-minikube-cluster to be more logging info, restart docker service and minikube context after the run Signed-off-by: sitloboi2012 <[email protected]>
…YuhanLiu11 Signed-off-by: sitloboi2012 <[email protected]>
…service discovery Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
…ree vLLM metrics, operational metrics, router observe metrics, update requirement.txt for router and tests, update install-minikube-cluster to be more logging info, restart docker service and minikube context after the run Signed-off-by: sitloboi2012 <[email protected]>
…YuhanLiu11 Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
…s.py and request_stats.py, update observe/install.sh to helm update to solve the problem already run helm service same name, update dashboard layout with latest metrics, update request_generator to align with the new endpoint /v1 Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
Signed-off-by: sitloboi2012 <[email protected]>
4f190d3
to
d7b7f4b
Compare
Signed-off-by: sitloboi2012 <[email protected]>
hi @YuhanLiu11 just updated, merged and ran pre-commit for formatting and linting, please help me merge and close this PR 😄 thank you team |
The test Functionality test for helm chart / Multiple-Models (pull_request) is flaky. I rerun it several times to pass it. Now I think it is ready to merge. /cc @YuhanLiu11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now. Thanks!
Feature Router observability (Current QPS, router-side queueing delay, etc) Part 1: #78
Due to the complexity and large task, I will split this feature into 2 sub-small PR for easier manage:
Description
This PR is to create 4 main things for the first part of the Router Observability update:
Update vLLM Dashboard Grafana to align with the reference from @YuhanLiu11 combine with add-on. The metrics include
Update
router.py
to tracking information related to the vLLM Dashboard, add on logging for easier trackingUpdate
README.md
file to include exposing port for Prometheus dashboard as wellUpdate
utils/install-minikube-cluster.sh
to include update minikube context, restart docker when setting up the observability, add on logging for easier interpretationUpdate
engine_stats.py
andrequest_stats.py
to align with the metrics informationMinor update:
requirements.txt
fortests
andvllm_router
observe/install.sh
fromhelm install
tohelm upgrade
to solve the problem exist namespace when re-installbase_url
intests/perftest
to align with latest v1 endpointNext step:
router.py
to calculate some customize metricsREADME.md
file to explain metrics and dashboard informationBEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE
PR Checklist (Click to Expand)
Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
[Bugfix]
for bug fixes.[CI/Build]
for build or continuous integration improvements.[Doc]
for documentation fixes and improvements.[Feat]
for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).[Router]
for changes to thevllm_router
(e.g., routing algorithm, router observability, etc.).[Misc]
for PRs that do not fit the above categories. Please use this sparingly.Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR need to meet the following code quality standards:
pre-commit
to format your code. SeeREADME.md
for installation.DCO and Signed-off-by
When contributing changes to this project, you must agree to the DCO. Commits must include a
Signed-off-by:
header which certifies agreement with the terms of the DCO.Using
-s
withgit commit
will automatically add this header.What to Expect for the Reviews
We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11
, Shaoting-Feng or ApostaC.